Audio enhancement system

ABSTRACT

A system for enhancing the sound signal produced by an audio system in a listening environment by compensating for ambient noise in the listening environment is provided. The system receives an electrical sound signal and generates a sound output therefrom. A total sound signal is sensed representative of the total sound level in the environment, where the total sound level includes both the sound output from the audio system and the ambient noise within the environment. The system extracts an ambient noise signal representative of the ambient noise in the environment from the total sound signal in response to the total sound signal and to a reference signal derived from the electrical sound signal. The system extracts the ambient noise signal using an adaptive filter with an adaptive step size. The system generates a control signal in response to the ambient noise signal and adjusts the sound output of the audio system to compensate for the ambient noise level in response to the control signal. The system calculates a step size for controlling the adaptive step size of the adaptive filter.

1 CLAIM OF PRIORITY

This patent application claims priority to European Patent Application serial number 05 009 835.9 filed on May 4, 2005.

2 FIELD OF THE INVENTION

This invention relates to a system for improving the sound reproduced by an audio system in a listening environment such as a vehicle and, in particular, to a system which compensates for noise outside the audio system.

3 RELATED ART

When music or speech is reproduced, for example, in a vehicle, the signal is typically corrupted by external acoustic noise present in the vehicle. This noise may result from and is dependent upon vehicle speed, road condition, weather and condition of the vehicle. The presence of such noise results in a situation where soft sounds of interest are hidden, the perceived loudness is reduced and the intelligibility of the signal is lessened. The vehicle driver and/or passengers may compensate for increased external noise by increasing the volume of the audio system. However, when the vehicle speed decreases or another source of external noise is alleviated, the audio system's volume will be too high, requiring the user to reduce it. To overcome this, U.S. Pat. Nos. 5,434,922 and 6,529,605 propose an enhanced dynamic volume control (DVC) system which extracts the noise signal from a signal provided by a sensor (e.g., a sensing microphone) in the listening environment and calculates a control signal therefrom. This control signal is used to control the volume and/or dynamics of the desired signal (e.g., the music).

A DVC system extracts the noise signal from mixed signals derived from a sensor (e.g., a microphone). The mixed signals comprise music components (i.e., the desired signal), voice components and noise components. The noise component is intended to be used solely for obtaining the control signal for the volume or dynamics of the desired signal. It is desired that the other components have no effect on the derivation of the control signal. Otherwise the system would respond to voice signals or control itself through the music, which results in a gain chase situation (i.e., direct feedback). Such a gain chase situation could lead to instability in the entire audio system.

The music signal is typically extracted from the sensor signal using an adaptive filter. The voice signal left in the remaining signal mixture is then masked out using a “voice activity detector” (VAD). The VAD operates continuously in the time domain (i.e., in a broadband manner) and is typically implemented by an averaging unit with predetermined increments and decrements. That is, as long as the input signal is larger than the output signal, the VAD increases its output signal by a fixed increment, or reduces its output signal by a fixed decrement whenever the input signal is less than the VAD output signal. In this way, the VAD utilizes the different stationary properties of the voice and noise signals. The noise signal is strongly (broadband) smoothed so that the VAD output signal (i.e., the control signal), even if somewhat delayed, reaches a stationary final value, which approximately corresponds to the average power of the noise signal in the sensor signal. Depending on the configured volume or selected equalizing, which mainly refers in this instance to the bass setting, the music signal more or less penetrates the noise signal; that is, the louder the desired signal (e.g., the music signal) is played or the higher the bass controller is set, the greater the share of the music signal that passes unfiltered through the adaptive filter. This can lead to the undesirable gain chase situation described above.
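The averaging unit with fixed increments and decrements described above can be illustrated with a minimal Python sketch. The function name `track_noise_level`, the step values, and the start value are assumptions chosen for illustration only; they are not taken from the cited DVC systems.

```python
def track_noise_level(levels, increment=0.01, decrement=0.01, start=0.0):
    """Follow an input level using fixed up/down steps.

    Hypothetical illustration: the step sizes are placeholders. Because
    the output can only move by a fixed amount per sample, short bursts
    (such as speech) barely shift it, while a steady noise level is
    eventually reached.
    """
    out = start
    trace = []
    for level in levels:
        if level > out:
            out += increment      # input above output: step up
        elif level < out:
            out -= decrement      # input below output: step down
        trace.append(out)
    return trace
```

With small steps, a brief loud burst moves the output by at most the burst length times the increment, which is how such a unit can exploit the different stationarity of voice and noise.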

It has been found that the adaptive filter works relatively better (i.e., permits less of the desired music signal to pass through it) if the signals have a narrower bandwidth. For this reason, the DVC system mostly works with strongly undersampled signals, which reduces the implementation complexity, but leads to a control signal which is derived solely from the low-frequency noise component and is then applied in a broadband manner to control the volume or dynamics. Since low-frequency noise signals typically dominate in vehicles (the field for which most DVC systems are primarily designed), the spectral limitation described above can only actually be considered in this context. Nonetheless, the solution is not fully ideal and could lead to overlapping effects in certain circumstances, which is why a broadband solution is preferable. Although the risk of gain chase is reduced through limiting the bandwidth, it is not fully eliminated.

One way of completely avoiding gain chase is to upward-limit the control signal in accordance with the existing power of the desired signal, which is implemented in common systems in the form of an anti-gain chase function. This function permits the control signal, provided the desired signal is below a specific minimum threshold value, to pass through the filter without being changed, but limits the control signal to a maximum value specified by another function if the power of the desired signal rises above the threshold, and blocks further control once the desired signal has exceeded a maximum threshold (i.e., the control signal is then replaced by zero). The control signal modified in this way can then be used to alter the volume and/or dynamics of the desired signal using a compressor. However, the control signal is, nevertheless, fully dependent on the mean power of the currently existing noise signal but does not consider its spectral distribution or coloring.
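The three regions of the anti-gain chase function described above (pass-through, limiting, blocking) can be sketched in Python as follows. The text does not specify the limiting function used between the two thresholds, so the linear taper below is purely an assumption, as are the name `anti_gain_chase` and all parameter names.

```python
def anti_gain_chase(control, signal_power, p_min, p_max, ceiling):
    """Limit a control value according to the power of the desired signal.

    Hypothetical sketch: below p_min the control value passes unchanged,
    above p_max it is replaced by zero, and in between it is capped by an
    ASSUMED linear taper that falls from `ceiling` to zero.
    """
    if signal_power <= p_min:
        return control                      # pass through unchanged
    if signal_power >= p_max:
        return 0.0                          # block further control
    taper = (p_max - signal_power) / (p_max - p_min)
    return min(control, ceiling * taper)    # upward-limit the control value
```

Any monotonically decreasing cap could replace the linear taper; the sketch only shows how the two thresholds partition the behavior.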

In this regard, “dynamic equalizer control” (DEC) systems are considered successors to DVC systems. However, one aspect that hinders the transition from DVC to DEC systems is the limited bandwidth with which DVC systems work. The reason why the bandwidth is limited is primarily to lessen the risk of gain chase and additionally to reduce the implementation costs.

It is already known from the DVC system how to calculate the required signal from a microphone signal for use as a loudness control value (i.e., volume). However, it is also evident that a straightforward volume control does not represent the most desirable solution because it does not consider the spectral distribution of the noise signal. Methods in both the time and spectral domains are known with which the spectral coloring of the noise can be integrated into the processing of the required signal using adaptive equalizing as illustrated above in the DEC system. In particular, the problems associated with noisy voice signal components in the microphone signal are given special attention. However, other difficulties relating to both the DVC and DEC systems are also disregarded as before.

What is needed is an improved system which automatically compensates for the noise level in a listening environment in the time domain.

SUMMARY OF THE INVENTION

A system for enhancing the sound signal produced by an audio system in a listening environment by compensating for ambient noise in the listening environment is provided. The system receives an electrical sound signal and generates a sound output therefrom. A total sound signal is sensed that is representative of the total sound level in the environment. The total sound level includes both the sound output from the audio system and the ambient noise within the environment. The system extracts an ambient noise signal representative of the ambient noise in the environment from the total sound signal in response to the total sound signal and to a reference signal derived from the electrical sound signal. The system may extract the ambient noise signal using an adaptive filter with an adaptive step size. The system generates a control signal in response to the ambient noise signal and adjusts the sound output of the audio system to compensate for the ambient noise level in response to the control signal. A step size is calculated for controlling the adaptive step size of the adaptive filter.

The other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims.

DESCRIPTION OF THE DRAWINGS

The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.

FIG. 1 is a block diagram illustration of an adaptive FIR filter with an adaptation step size;

FIG. 2 is a block diagram illustration of a four-speaker/one-microphone arrangement;

FIG. 3 is a block diagram illustration of a DVC system with an improved composition of the additive reference signal;

FIG. 4 is a block diagram illustration of a DVC system that includes the room transfer functions;

FIG. 5 is a block diagram illustration of a mixture of the two systems illustrated in FIGS. 3 and 4;

FIG. 6 is a block diagram illustration of the use of a whitening filter in place of a simple high-pass filter for spectral smoothing of the noise signal;

FIG. 7 is a flowchart diagram of a voice activity detection model;

FIG. 8 is a block diagram illustration of an arrangement of a multi-channel DVC/DEC system;

FIG. 9 is a block diagram illustration of the structure of an equalizing filter;

FIG. 10 is a block diagram illustration of an exemplary DVC/DEC system;

FIG. 11 is a block diagram illustration of an exemplary DVC/DEC alternative system;

FIG. 12 is a block diagram illustration of a complex exemplary DVC/DEC system;

FIG. 13 is a block diagram illustration of the generation of a reference signal and a reference volume;

FIG. 14 is a graph that illustrates the characteristics of a gain correction function;

FIG. 15 is a block diagram illustration of an exemplary implementation of a gain correction function;

FIG. 16 is a graph that illustrates the characteristics of an activation threshold; and

FIG. 17 is a block diagram illustration of an implementation of a minimum noise activation threshold function.

DETAILED DESCRIPTION

In all the adaptive filters presented thus far, which have the effect of removing the desired signal (the source signal, e.g., music, speech, et cetera) as much as possible from the microphone signal, a fixed adaptation step size expressed by μ was used. This does not represent the most desirable situation, but instead reflects a compromise between stability and adaptation speed. Provided it can be assured that the desired signal is above the noise signal, using a fixed adaptation step size is acceptable. However, should the opposite situation apply to the signals, even if only for a brief period, the estimated loudspeaker/room microphone (LRM) transfer function is corrupted.

As a consequence, the estimated noise level incorrectly rises due to the desired signal components that are now considered. This situation is unacceptable because it can lead to problems in unfavorable circumstances. The reason for the exhibited behavior is that the desired signal (e.g., the sound signal) serves as a test (or reference) signal with which the LRM transfer function is estimated using the adaptive filter. If the noise signal component is now greater than the desired signal component in the microphone signal, the room can no longer be correctly analyzed. The result is an LRM model comprising fully randomly assembled filter coefficients that cannot be iteratively reordered until the signal for measuring the room is above the noise signal. This requires some time to occur, during which the estimated transfer function is incorrect, and which constitutes a relatively major error in the removal of the desired signal component from the microphone signal.

The behavior of the adaptive filter in such situations is as follows: once the noise component becomes greater than the required signal component, the current LRM estimation (which exists as a set of filter coefficients) is frozen until the correct relationships apply once more. It is assumed that the LRM system changes relatively slightly or not at all during this time, as proved in practice. Moreover, the current adaptation success, as represented by the value of the system distance, has to be considered along with the present signal-to-noise ratio (SNR) value.

The system distance indicates the extent to which the current model differs from the one that actually exists, that is, it gives a measure of how much the adaptive filter has already approached the desired value. If the distance between the estimated and existing acoustic models is small, then the system distance is likewise small. Consequently, the adaptation step size is reduced to approach the desired model even more closely and to increasingly consolidate the model to be more resistant to brief occurrences of noise.

To estimate the system distance, a defined delay is introduced, for example, into the microphone signal branch. The delay is estimated by the adaptive filter using delayed filter coefficients. If the adaptive filter is properly configured, the coefficients assume the value zero. If the filter is instead improperly configured, the coefficients assume a non-zero value that varies in a direct relationship to the existing noise, thereby enabling conclusions regarding both the success of the adaptation and the system distance to be made. Generally, what is required is a small number of coefficients whose sum provides a value for the current system distance.

Referring to FIG. 1, an adaptive finite impulse response (FIR) filter with an adaptation step size, μ[n], includes an FIR filter core 1 controllable by a Least Mean Square (LMS) adaptation unit 2. The LMS unit 2 receives as input signals a source signal x[n] from a signal source 3 (e.g., a CD player, radio, etc.), an error signal e[n] from a subtraction unit 4, and an adaptation step size μ[n] from a step size calculation unit 5. The step size calculation unit 5 receives a mean source signal MeanX[n], an estimated system distance Dist[n] supplied by the FIR filter core 1, and a mean error signal MeanE[n]. The signals MeanX[n] and MeanE[n] are provided by corresponding averaging units 6 and 7, respectively. The subtraction unit 4 further receives via a delay unit 8 a signal from a microphone 9, which is part of a loudspeaker-room-microphone (LRM) system 10. The LRM system 10 is located in a room 11 having a transfer function H_(Room)(z), in which the microphone 9 and a loudspeaker 12 supplied with the source signal x[n] are arranged. The microphone 9 receives acoustic signals from the loudspeaker 12 filtered by the transfer function H_(Room)(z), along with noise signals from at least one noise source 13 outside (as shown) and/or inside (not shown) the room 11.

To calculate the SNR value, the mean values of both the source signal x[n] and the resulting output signal of the adaptive filter (that is, the error signal e[n]) are recursively determined using a simple first order Infinite Impulse Response (IIR) low-pass filter.

The estimated adaptation step size, μ[n], can be calculated as follows:

$$\mu[n] = \frac{\mathrm{MeanX}[n] \cdot \mathrm{Dist}[n]}{\mathrm{MeanE}[n]} \qquad (1)$$

where the estimated system distance is given by the following equation:

$$\mathrm{Dist}[n] = \frac{1}{N_{t}} \cdot \sum_{n=1}^{N_{t}} w[n] \qquad (2)$$

The expression $\frac{\mathrm{MeanX}[n]}{\mathrm{MeanE}[n]}$ in equation 1 corresponds to the signal-to-noise ratio (SNR) value at the sampling time n. Thus, equation 1 can also be expressed as follows:

μ[n] = Dist[n] * SNR[n]  (3)

which represents a relatively desirable adaptation step size at the sampling time n.
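The interplay of equations 1 to 3 with the delayed microphone branch can be illustrated with a minimal Python sketch. Several details are assumptions made only so that the example runs: the update is normalized (NLMS) rather than a plain LMS update, the step size is clamped to a hypothetical range [`mu_min`, `mu_max`] so that adaptation can start from all-zero coefficients, magnitudes are used both in the recursive averages and in the sum over the delay coefficients, and all names (`step_size`, `identify_room`, `alpha`) are illustrative.

```python
def step_size(mean_x, mean_e, dist, mu_min=0.01, mu_max=1.0, eps=1e-12):
    """Equation (1): mu = MeanX * Dist / MeanE; the clamping is an assumption."""
    mu = mean_x * dist / (mean_e + eps)
    return max(mu_min, min(mu_max, mu))


def identify_room(x, mic, n_delay, n_taps, alpha=0.01):
    """Adapt an FIR model of the delayed LRM path and return its coefficients.

    x       reference (source) samples
    mic     microphone samples
    n_delay delay (>= 1) inserted in the microphone branch; the first
            n_delay coefficients should converge to zero and serve as
            the system-distance estimate
    """
    w = [0.0] * n_taps
    x_buf = [0.0] * n_taps            # newest reference sample first
    mic_delay = [0.0] * n_delay       # delay line in the microphone branch
    mean_x = mean_e = 0.0
    for x_n, d_n in zip(x, mic):
        x_buf = [x_n] + x_buf[:-1]
        mic_delay.append(d_n)
        d_delayed = mic_delay.pop(0)
        y = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = d_delayed - y
        # First-order IIR low-pass averages of |x| and |e|
        mean_x += alpha * (abs(x_n) - mean_x)
        mean_e += alpha * (abs(e) - mean_e)
        # Equation (2), using magnitudes of the delay coefficients (assumed)
        dist = sum(abs(wi) for wi in w[:n_delay]) / n_delay
        mu = step_size(mean_x, mean_e, dist)
        norm = sum(xi * xi for xi in x_buf) + 1e-9
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
    return w
```

When the error grows relative to the reference (low SNR) or the delay coefficients are already close to zero (small system distance), the computed step size shrinks toward `mu_min`, which approximates the freezing behavior described earlier.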

The use of the DVC system is usually restricted to the lower spectral range, but even this range is generally not emitted solely from one loudspeaker, but from a number of loudspeakers located at different positions within the listening room. As a consequence, the source signal played through the considered loudspeaker, represented by the reference signal x_(x)[n], which is convolved with the room impulse response between the source signal and the microphone, affects the composite signal recorded by the microphone.

FIG. 2 is an example of a four-speaker/one-microphone arrangement. The arrangement comprises a microphone 14 and four loudspeakers 15-18 arranged in the front left (FL), front right (FR), rear left (RL), and rear right (RR) of a room 19, which in the present exemplary case is the interior of a vehicle. Between each of the loudspeakers 15-18 and the microphone 14 is a room transfer function H₁(z), H₂(z), H₃(z), and H₄(z), respectively. Although in known DVC systems the different source signals are computed through simple addition and the resulting composite signal is used as a reference signal x[n] for the adaptive filter, the effects of the various room transfer functions still remain unconsidered. For example, if a system comprising four loudspeakers and one microphone as in the present example is considered, all four room transfer functions H₁(z), H₂(z), H₃(z), and H₄(z) between the loudspeakers 15-18 and the microphone 14 must be considered. Until now, a single adaptive filter was used to remove the required (additive) signal from the microphone signal. In that situation, the adaptive filter did no more than reflect the mean value of the four existing transfer functions.

The transfer functions with the strongest signal component, which normally stem from the loudspeakers nearest the microphone, have greater impact on the mean calculation than the weaker functions, derived mostly from loudspeakers positioned further away. This system functions relatively well on average, but still appears to be inadequate for a broader range of applications. To improve the system, the system of FIG. 2 having the four different loudspeakers 15-18 and the microphone 14 is considered. The composition of the resulting composite reference signal x[n] can be improved by considering in the calculation of the additive reference signal at least the different signal delay times and amplitudes resulting from the different distances of the loudspeakers 15-18 from the microphone 14. Beginning with the nearest loudspeaker to the microphone 14, which is taken as a reference point, the additional (acoustic) signal delays and attenuations in relation to the reference point are determined.

FIG. 3 illustrates a DVC system using an improved composition of the composite reference signal. The system of FIG. 3 comprises the microphone 14 (as in FIG. 2) connected to an analog-to-digital (AD) converter 20 for converting an analog signal from the microphone 14 into a digital signal. The digital signal is input to a low pass (LP) filter 21 serving as an anti-aliasing filter for a subsequent sample rate conversion unit 22 which changes the sample rate by a factor of, for example, 16. The composite reference signal x[n], which is the sum of individual source (reference) signals x₁[n], x₂[n], x₃[n], and x₄[n], each supplied to the respective loudspeaker 15-18, is provided by an adder unit 23. The source signals x₂[n], x₃[n], and x₄[n] are each supplied to the adder 23 via a delay 24 (“Delay 2”), 25 (“Delay 3”), and 26 (“Delay 4”), respectively, and a corresponding attenuation unit 28 (“Mag2”), 29 (“Mag3”), and 30 (“Mag4”), respectively. The source signal x₁[n] is supplied to the adder 23 directly via an attenuation unit 27 (“Mag1”).

An AD converter 28 is connected to the output of the adder unit 23 for converting the analog additive reference signal x[n] into a digital signal which is fed into an LP filter 29 serving as an anti-aliasing filter for a subsequent sample rate conversion unit 30. The output signal of the sample rate conversion unit 30 is fed via a delay unit 31 into an FIR filter core 32 and an LMS filter control unit 34 for controlling the FIR filter core 32. The output of the FIR filter core 32 is connected to one input of a subtraction unit 33, while a second input of the subtraction unit 33 is connected to the output of the sample rate conversion unit 22. The subtraction unit 33 outputs the error signal e[n], which is fed back to the filter control unit 34 for a comparison with the delayed reference signal. The error signal e[n] may also be supplied to other units such as, for example, a predictive error filter as illustrated in FIG. 6.

In FIG. 3, each component of the reference signal enters the composition of the reference signal with its own delay and its own amplification or attenuation factor. Subsequently, the basic delay inherent in the path from the reference loudspeaker 15 to the microphone 14, which is referred to as “Delay1” (or “bulk delay”), can be considered within the undersampled signal processing range to increase the effective length of the adaptive filter. In doing so, an attempt is made to move the long delays into the undersampled range as much as possible to reduce memory consumption and to equalize only the differences in delay at the full sampling frequency using delay lines.
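The composition of the additive reference signal from individually delayed and scaled source signals can be sketched in Python as follows. The function name `composite_reference` and the use of integer sample delays and scalar gains are assumptions for illustration; the AD conversion, anti-aliasing, and sample rate conversion stages of FIG. 3 are omitted.

```python
def composite_reference(sources, delays, gains):
    """Sum delayed and scaled source signals into one reference signal.

    Hypothetical sketch of x[n] = sum_i gains[i] * sources[i][n - delays[i]],
    where delays are whole samples relative to the loudspeaker nearest the
    microphone (which would use delay 0).
    """
    length = len(sources[0])
    out = [0.0] * length
    for src, delay, gain in zip(sources, delays, gains):
        for n in range(delay, length):
            out[n] += gain * src[n - delay]
    return out
```

For example, `composite_reference([front, rear], [0, 2], [1.0, 0.5])` would add the rear signal two samples late and at half amplitude, where `front` and `rear` are equally long sample lists.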

At the same time, existing symmetries can be utilized, for example, if the paths of the two front (FL, FR) and the two rear (RL, RR) loudspeakers are identical. In such cases, first the symmetrical signals may be added together and then the composite reference signal is adapted to the corresponding second adaptive signal in terms of delay time and amplitude. The effective length of the adaptive filter is increased by including the bulk delay in the calculation since the adaptive filter would otherwise equalize the acoustic signal delay time associated with the loudspeaker nearest to the microphone (e.g., the loudspeaker 15 in FIG. 2). This intrinsic, acoustic signal delay time can also be used advantageously to determine the delayed coefficients. Using this technique, the delay line on the microphone signal path can be omitted under certain circumstances. Nonetheless, this enhancement does not yield the most desirable system. To do so, all input signals must be separately considered, resulting in the system illustrated in FIG. 4.

FIG. 4 is a schematic diagram of a DVC system that includes all room transfer functions. As in FIG. 3, the system of FIG. 4 comprises a signal path (e.g., a microphone path) established by the microphone 14, the AD converter 20, the low-pass filter 21, and the sample rate conversion unit 22. Further, the signal path of FIG. 3 established by the AD converter 28, the low-pass filter 29, the sample rate conversion unit 30, the delay unit 31, the FIR filter core 32, the filter control unit 34, and the subtraction unit 33 is realized in FIG. 4 by a path (reference path 1) comprising an AD converter 128, a low-pass filter 129, a sample rate conversion unit 130, a delay unit 131, an FIR filter core 132, a filter control unit 134, and a subtraction unit 133. The AD converter 128 receives the reference signal x₁[n] (instead of reference signal x[n] of FIG. 3), and the subtraction unit 133 provides an error signal e₁[n] from the processed microphone signal and the processed reference signal x₁[n].

In contrast to the system of FIG. 3, the system of FIG. 4 comprises three additional reference paths (paths 2, 3, 4), each established by AD converters 228, 328, 428; low pass filters 229, 329, 429; sample rate conversion units 230, 330, 430; delay units 231, 331, 431; FIR filter cores 232, 332, 432; filter control units 234, 334, 434; and subtraction units 233, 333, 433. Each of the AD converters 228, 328, 428 receives the respective reference signal x₂[n], x₃[n], x₄[n], and the subtraction units 233, 333, 433 provide error signals e₂[n], e₃[n], and e₄[n] from the error signals e₁[n], e₂[n], and e₃[n] and the processed reference signals x₂[n], x₃[n], and x₄[n], respectively. The reference paths 1, 2, 3, and 4 are identical at least in structure.

The reason the individual room transfer functions are used as illustrated in the system of FIG. 4 (i.e., with linking of the individual adaptive filters and not in the conventional, parallel fashion) is explained by the relatively strong correlation of the input signals. The individual room transfer functions may be estimated in parallel independently of each other if the input signals x₁[n], . . . , x₄[n] are uncorrelated.

However, since there is a relatively strong correlation between the input signals, the system functions as follows: the first adaptive filter (in reference path 1), which is fed with the first undersampled reference signal, x₁[n], through the delay line 131 (“Delay1”), removes all signal components from the microphone signal that are correlated with the reference signal, x₁[n]. Ideally, the error or output signal, e₁[n], of the first adaptive filter only contains signal components no longer correlated with the reference signal, x₁[n]. The second adaptive filter (in reference path 2) can adapt only if there are components in the error signal of the first adaptive filter, e₁[n], that are correlated with the second reference signal, x₂[n]. Otherwise, the second adaptive filter transfers the signal that is also not correlated with the reference signal, x₂[n], as its output signal, and so forth. At the end of the chain, a signal is correspondingly output that is correlated with none of the four reference signals, x₁[n], x₂[n], x₃[n], x₄[n]. All that remains is a signal mixture comprising the speech signal and background noise, represented in FIG. 4 as “Noise[n]”.
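The chained arrangement, in which each adaptive filter works on the error signal of the previous one, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a normalized LMS update with a fixed step size stands in for the filter control units, the delay lines and sample rate conversion are left out, and the names `nlms_stage` and `remove_references` are hypothetical.

```python
def nlms_stage(reference, signal, n_taps=4, mu=0.05):
    """Remove the part of `signal` that is linearly predictable from `reference`.

    Returns the error signal of one adaptive filter stage (assumed NLMS).
    """
    w = [0.0] * n_taps
    buf = [0.0] * n_taps              # newest reference sample first
    error = []
    for r, s in zip(reference, signal):
        buf = [r] + buf[:-1]
        e = s - sum(wi * bi for wi, bi in zip(w, buf))
        norm = sum(b * b for b in buf) + 1e-9
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        error.append(e)
    return error


def remove_references(references, mic):
    """Chain the stages: the error of stage i is the input of stage i + 1."""
    residual = mic
    for ref in references:
        residual = nlms_stage(ref, residual)
    return residual
```

Called as `remove_references([x1, x2, x3, x4], mic)`, the result corresponds to the signal at the end of the chain, that is, what remains after each reference has had its correlated component removed in turn.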

Normally when using music as a reference signal source, a relatively strong correlation can be noted between the individual reference signals x₁[n], . . . , x₄[n]. For this reason, relatively little signal remains after the first adaptive filter for which it may be worthwhile analyzing the remaining reference signals for correlations using adaptive filters. Therefore, perhaps the entire effort may not be worthwhile in practical terms. In terms of signal theory, the system of FIG. 4 is valid. However, when viewed economically, the system of FIG. 3 is perhaps more likely to be adopted. Also conceivable is a mixture of the two systems of FIGS. 3 and 4, which may represent a viable compromise.

FIG. 5 illustrates such a mixture of the two systems of FIGS. 3 and 4. The system of FIG. 5 has a microphone path and the reference paths 1, 2 as illustrated in the system of FIG. 4. However, path 2 (i.e., AD converter 228) is not supplied with the reference signal x₂[n] as in FIG. 4, but is instead supplied with a mean reference signal similar to that of FIG. 3. In the system of FIG. 5, the mean reference signal is provided by an adder unit 123 receiving the processed reference signals x₂[n], x₃[n], x₄[n]. The processing is such that the reference signal x₂[n] is provided to the adder unit 123 via an attenuation unit 127, while the reference signals x₃[n], x₄[n] are provided via the attenuation units 29, 30 and the delay units 25, 26, respectively. The additive error signal “Noise[n]” is output by the subtraction unit 233.

One of the known DVC alternative models has, at least optionally, a high-pass filter arranged after the adaptive filter. The function of the high-pass filter is to smooth the noise spectrum, which is known to have very low frequencies in vehicles, to such an extent that the DVC system can respond not only to the predominant engine and movement noises but also to high-frequency wind noise, which has much weaker signal levels. Even though wind noise has considerably weaker levels than those of the low-frequency components, such noise is found in a spectral region in which human hearing is particularly sensitive, and therefore even a low noise signal level requires a control mechanism.

Until now, a high-pass filter has been used to smooth the noise spectrum as much as possible so that the system can respond practically equally well to both low-frequency noise and the high-frequency wind noise signals just described. The high-pass filter has been set to a fixed value and then left unchanged. This setting is typically adequate for the majority of cases, as demonstrated in many tests. However, there are certain noise situations in which the high-pass filter may act poorly; that is, the filter does not appreciably smooth the spectrum of the noise signal, with the effect that there are different responses to noise signals with differing spectral distribution.

To overcome this, the spectral smoothing filter may be set adaptively in response to the currently applicable noise situation, where linear predictive coding (LPC) analysis is utilized by feeding a predictor error filter (also known as a pre-whitening filter) with the iteratively calculated reflection coefficient. The system is also able to configure the equalizing filter according to the pertaining noise situation in such a way that the output signal of the predictor error filter is distributed across the spectrum as white noise as much as possible.

FIG. 6 illustrates the use of a whitening filter in place of a simple high-pass filter for spectral smoothing of a noise signal, where one reference path (reference path 1 of FIG. 5) is used for explanation purposes in FIG. 6. However, the use of a whitening filter can easily be adapted for the other reference paths. In the system of FIG. 6, high-pass filters 35 and 135 are included in the microphone path and the reference path (reference path 1), respectively, after the corresponding sample rate conversion units. Further, the error signal e[n] provided by the subtraction unit 133 is fed into a predictive error filter 36 controlled by a linear predictive coding (LPC) unit 37 that also receives the error signal e[n]. An effective technique of doing this in the time domain is to compute the LPC coefficients using LPC analysis, and to input the coefficients into a predictor filter with which the error signal is weighted.

It is known from other applications, such as acoustic echo cancellation (AEC), that echo coefficients are calculated from the voice signal in those cases using LPC analysis, and that the coefficients are then fed into a predictor error filter for filtering the microphone signal. The voice signal recorded by the sensing microphone is spectrally smoothed so that the AEC system can work more efficiently. The AEC application therefore refers to a pre-whitening filter whose effects can be described as follows: the LPC analysis supplies coefficients that are used in a corresponding predictor filter that has a transfer function corresponding to the power spectral density (PSD) of the signal (i.e., voice signal) examined by the analysis. When used in a predictor error filter, its transfer function does not follow the PSD of the analysis signal, but rather its inverse pattern. For this reason, the voice component in the microphone signal on which the LPC analysis is based is spectrally smoothed after the predictor error filter, that is, it is whitened (thus giving rise to the term “pre-whitening filter”).

Linear predictive coding is a relatively efficient and effective technique, for example, to achieve synthetic speech and speech signal communication. The efficiency of the method is due to the speed of the analysis algorithm and to the relatively low bandwidth required for the encoded signals. The effectiveness is related to the intelligibility of the decoded vocal signal. LPC implements a type of vocoder which is an analysis/synthesis scheme where the spectrum of a source signal is weighted by the spectral components of the target signal that is being analyzed. The phase vocoder is, for example, a particular type of vocoder where amplitude and phase information of the analysis channels is retained and can be used as coefficients for an error filter.

In standard LPC, the source signals are either white noise or a pulse train, thus resembling unvoiced or voiced excitations of the vocal tract, respectively. The basic assumption behind LPC is the correlation between the n-th sample and the P previous samples of the target signal. Namely, the n-th signal sample is represented as a linear combination of the previous P samples, plus a residual representing the prediction error. The analysis problem is equivalent to the identification of the coefficients, a, of an all-pole filter. When reducing the error in a mean square sense, the problem translates into a set of P equations which, in the z domain (frequency domain), reduce to E(z)=A(z)X(z), where A(z) is the polynomial with coefficients a and E(z) is the prediction error. In the case of voice signal analysis, the filter 1/A(z) is called the all-pole formant filter because, if the proper order P is chosen, its magnitude frequency response follows the envelope of the signal spectrum, with its broad resonances called formants. The filter A(z) is called the inverse formant filter because it extracts from the voice signal a residual resembling the vocal tract excitation. A(z) is also called a whitening filter because it produces a residual having a flat spectrum.
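As a worked form of the relation above, the prediction model and the two filters can be written out explicitly. This is a sketch only: the sign convention chosen for the coefficients a_k is an assumption, since the text does not fix one.

$$x[n] = -\sum_{k=1}^{P} a_k\, x[n-k] + e[n]$$

$$A(z) = 1 + \sum_{k=1}^{P} a_k z^{-k}, \qquad E(z) = A(z)\,X(z), \qquad X(z) = \frac{E(z)}{A(z)}$$

Under this convention, filtering the signal with A(z) yields the residual e[n] (whitening), while driving 1/A(z) with a flat-spectrum excitation reproduces the spectral envelope of the analyzed signal.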

However, there are two kinds of residuals, both having a flat spectrum: the pulse train and the white noise, the first being the idealized vocal-fold excitation for voiced speech, the second being the idealized excitation for unvoiced speech. In reality, the residual is neither one of the two idealized excitations. At the re-synthesis stage the choice is either to use an encoded residual or to choose one of the two idealized excitations according to a voiced/unvoiced decision made by the analysis stage.

When the target signal is periodic (voiced speech), a pitch detector can be added to the analysis stage, so that the re-synthesis can be driven by periodic replicas of a basic pulse, with the correct inter-pulse period. Several techniques are available for pitch detection, either using the residual or the target signal. Although not particularly efficient, one possibility is to perform a Fourier analysis of the residual and estimate the fundamental frequency by the techniques of section. For example, the information extracted by the analysis stage comprises the prediction coefficients a₁, . . . , a_(P); the residual e; the pitch of the excitation residual; the voiced/unvoiced information; and the signal energy (RMS amplitude). These parameters, possibly modified, are used in the re-synthesis stage, for example, by use of the well-known Levinson-Durbin recursion, which provides the reflection coefficients of the lattice realization of the filter 1/A(z).
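The Levinson-Durbin recursion mentioned above can be sketched as follows. This is a minimal illustration, not the implementation used in the described system: the function names, the sign convention A(z) = 1 + a[1]z⁻¹ + … + a[P]z⁻ᴾ, and the sign of the reflection coefficients are assumptions, and the example autocorrelation values are made up for demonstration.

```python
def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, k, err) where a holds the coefficients of
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order (assumed convention),
    k holds the reflection coefficients of the lattice realization,
    and err is the remaining prediction error energy.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for m in range(1, order + 1):
        # Correlation between the current residual and the lag-m sample
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        km = -acc / err
        k.append(km)
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + km * prev[m - j]
        a[m] = km
        err *= 1.0 - km * km
    return a, k, err


def prediction_error(x, a):
    """Whitening filter: e[n] = sum_j a[j] * x[n - j], with x[n] = 0 for n < 0."""
    return [
        sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
        for n in range(len(x))
    ]


# Hypothetical autocorrelation of a first-order process with correlation 0.9
r_example = [1.0, 0.9, 0.81]
a_example, k_example, err_example = levinson_durbin(r_example, 2)
residual_example = prediction_error([1.0, 0.9, 0.81], a_example)
```

For this first-order example the recursion finds a single significant coefficient (a[1] close to -0.9) and a second reflection coefficient close to zero, which is the expected behavior when the chosen order exceeds the order of the underlying process.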

An effective technique of doing this is to compute the LPC coefficients using LPC analysis, and to input the coefficients into a predictor filter with which the music signal is weighted. Known DVC systems typically use a simple voice activity detection (VAD) function that makes use of the long-term steady state of the background noise signal and the short-term steady state of the human voice with the aid of two mean value calculation units. This VAD variant is nonetheless relatively difficult to control, and for this reason a new VAD model that is easier to control and also even simpler to implement is preferred.

FIG. 7 illustrates the signal flow diagram for a VAD model. The VAD model examines whether the current noise value (Noise[n]) is greater than the previous noise level (NoiseLevel[n]). If this is the case, the new noise level (NoiseLevel[n+1]) is calculated from the previous noise level such that the new noise level increases by a predetermined increment, regardless of how much the current noise signal value is greater than the previous noise level.

This approach suppresses relatively strong effects of the voice signal components within the noise signal on the noise level, and does so without any delay, since the VAD model according to an aspect of the invention has no memory function, unlike known models. If the current noise signal is less than the previous noise level, the new noise level value is reduced by a predetermined decrement. The increment is generally defined to be less than the decrement so that there is no agitated response to brief energy spikes in the noise signal. The noise level is restricted to a lower limit using the low threshold value of the noise level, “MinNoiseLevel”. The noise level is prevented from falling below a specific level in this way. Without a low threshold value of this kind, the system would at times respond slowly to a rising noise signal because the rising speed (which is governed by the increment) is slower than the falling speed.
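The noise level update described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the increment and decrement are taken to be additive constants, the level is left unchanged when the noise value equals it, and the names `c_inc`, `c_dec`, and the numeric values are hypothetical.

```python
def update_noise_level(noise, level, c_inc, c_dec, min_noise_level):
    """One step of the memoryless noise level tracker.

    The level rises by a fixed increment whenever the current noise value
    exceeds it (no matter by how much), falls by a fixed decrement when the
    noise value is lower, and never drops below min_noise_level.
    """
    if noise > level:
        level += c_inc
    elif noise < level:
        level -= c_dec
    return max(level, min_noise_level)


def track_noise_level(noise_values, start_level, c_inc, c_dec, min_noise_level):
    """Run the tracker over a sequence and return the level after each sample."""
    level = start_level
    levels = []
    for noise in noise_values:
        level = update_noise_level(noise, level, c_inc, c_dec, min_noise_level)
        levels.append(level)
    return levels


# Hypothetical values: a brief spike (5.0) raises the level by one increment only
levels = track_noise_level(
    [0.2, 5.0, 0.2, 0.2, 0.0, 0.0],
    start_level=0.1, c_inc=0.01, c_dec=0.05, min_noise_level=0.05,
)
```

Because the step is independent of how far the noise value exceeds the level, a short loud event such as speech moves the estimate no more than ordinary background noise does.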

The DVC systems previously described are operated with one microphone. As such, the noise situation is recorded at the mounting point of the microphone. This information is then used to develop a global controlled variable for the entire vehicle. However, a number of aspects are not considered, which can lead to errors. For example, if the microphone is positioned where standing waves are located in the interior of the vehicle, the system can respond inadequately or not at all to noise signals at exactly the same frequencies. Ideally, the microphones are positioned at the position of the passenger's head, because it is the noise situation at this position that affects the passenger's auditory perception, which is intended to be enhanced using DVC and DEC systems.

Thus, the microphone is integrated in the headrest, for example, to ensure the microphone is positioned as close as possible to the best point of auditory perception. The same challenge applies to all other passengers in the vehicle. Even if a microphone is placed in the ideal position for the driver, this does not automatically mean that this arrangement produces good results at the remaining positions in the interior of the vehicle. Each listening position requires its own microphone for the system to provide equally good results. However, it is relatively complicated to obtain an effective control system as each loudspeaker would have to provide a contribution for each microphone position to acquire the desired control effect at the same time at every listening point.

FIG. 8 illustrates an example of an arrangement of a multi-channel DVC/DEC system in a room 19. A relatively simple technique for obtaining an individually adapted control system is to deploy the loudspeakers 15-18 located nearest the associated microphones 515-518 for control of the amplitude through the DVC system and control of the equalization through the DEC system. As such, only the spectral range of a subwoofer 500 is then controlled globally. This is, however, readily acceptable as sound in this range cannot generally be localized, and the human ear is relatively insensitive in this frequency range to variations in signal level.

The method using LPC analysis, which can be efficiently implemented, for example, using the gradient adaptive lattice (GAL) algorithm, is in principle an effective technique of implementing dynamic equalization with adaptation to the power density of the noise signal. In principle, any voice signal components present in the noise signal can for the most part be excluded using a relatively slow update speed for the GAL algorithm. However, some problems remain, which are illustrated herein together with their associated solutions.

If the LPC analysis is applied to the noise signal isolated by the adaptive filter using the GAL algorithm with a correspondingly slow update speed, the reflection coefficients deriving from the analysis are set in such a way that when included in a predictor filter (all-pole filter), the transfer function of the filter more or less exactly matches the characteristics of the noise signal's power spectral density (PSD), depending on the predictor order selected. The LPC analysis returns reflection coefficients whose values are based exclusively on the PSD shape of the analyzed noise signal, but that do not provide any information on the signal's amplitude relationships. Furthermore, the amplitude of the predictor filter is too high for the filter to be used directly.

To solve these problems, the transfer function of the predictor filter is continually scaled such that its maximum value is as close to 0 dB as possible. From the known scaling norms, i.e., the L1, L2 and L∞ norms, the L1 norm is utilized since it is relatively easy to calculate and also represents a relatively hard scaling type.

$$\mathrm{Scale} = \sum_{n=0}^{\infty} \left| f[n] \right| \qquad (4)$$

Equation 4 is a formula for calculating the scaling factor according to the L1 norm, where f[n] denotes the impulse response of the filter. As can be seen from equation 4, an infinite number of values of the filter's impulse response is required in theory to calculate the scaling factor exactly according to the L1 norm, which is, however, impossible in practical terms. In view of this, a limited impulse response is assumed, which is then used to calculate the scaling factor (“Scale”) according to the L1 norm. The length of the impulse response is defined on the basis of both the error that is made when calculating the scaling factor and the update time that is desired or needed to generate a scaling factor applicable to the next impulse response interval. To keep the update interval as short as possible, it is advisable to perform the calculation (at least for the impulse response) using the original sampling frequency f_(s), and not with the reduced sampling frequency f_(s)/R.
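The truncated L1 scaling factor can be illustrated with a short sketch. The assumptions here are a direct-form all-pole filter 1/A(z) with A(z) = 1 + a[1]z⁻¹ + … (rather than the lattice form driven by reflection coefficients), a hypothetical first-order coefficient set, and an arbitrarily chosen truncation length.

```python
def allpole_impulse_response(a, length):
    """First `length` samples of the impulse response of 1/A(z).

    a[0] is assumed to be 1; the filter is excited with a Dirac pulse.
    """
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0  # Dirac pulse input
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * h[n - k]
        h.append(acc)
    return h


def l1_scale(impulse_response):
    """Scaling factor per equation 4, restricted to a finite impulse response."""
    return sum(abs(v) for v in impulse_response)


# Hypothetical first-order predictor: h[n] = 0.5**n, so the exact L1 norm is 2
h_short = allpole_impulse_response([1.0, -0.5], 4)
h_long = allpole_impulse_response([1.0, -0.5], 50)
scale_short = l1_scale(h_short)
scale_long = l1_scale(h_long)
```

Comparing `scale_short` with `scale_long` shows the trade-off named in the text: a shorter impulse response gives a faster update but underestimates the scaling factor.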

Once it has been ensured that the predictor filter's maximum value is 0 dB (i.e., it is scaled to 0 dB), it remains to be decided to what extent the equalizing or predictor filter is permitted to influence the required useful signal. This decision is mainly dependent on the current noise level that has already been determined using the DVC system. This implies that it is practical to couple the DEC and DVC systems together. Nevertheless, it must still be determined how the equalizing filter is to be controlled on the basis of the existing noise level, and in what structure it can be applied.

FIG. 9 illustrates the structure of an equalizing filter. Peaking filters may be used for the implementation of the equalizing (EQ) filter. In peaking filters, a gain control can be used to set how far the implemented EQ filter is to extend beyond the reference amplitude, which is also normally at 0 dB. This structure is chosen for the EQ filter, that is, the gain from the DVC system indicates how much the maximum spectral share of the EQ filter scaled (by “Scale”) to 0 dB is to be higher than the reference amplitude. In this way, only the spectral components that are particularly strongly concealed by background noise are amplified along with the DVC gain. This has the same effect as dynamic equalizing, that is, a DEC system. The gain derived from the DVC system does not have to be accepted on a 1-to-1 basis; rather, it can be scaled at will for the DEC system or changed in any other way to obtain a satisfactory control effect. The filter illustrated in FIG. 9 comprises a source 40 connected to a subsequent adder unit 41 as well as a scaling unit 42. An equalizing (EQ) filter 43 and a subsequent gain adjusting unit 44 are connected downstream from the scaling unit 42. The output signal provided by the gain adjusting unit 44 is fed into the adder unit 41, whose output signal is provided (via units not shown) to a loudspeaker 45.
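The signal flow of FIG. 9 (direct path plus a scaled, equalized, gain-adjusted side path) can be sketched as below. This is only an illustration of the topology: the function name, the assumption that the scaling unit divides by “Scale”, and the use of a pass-through stand-in for the EQ filter are all hypothetical.

```python
def dec_output(source, eq_filter, scale, gain):
    """Combine the direct path with the side path of FIG. 9.

    source    -- samples from the source (unit 40)
    eq_filter -- callable standing in for the EQ filter (unit 43)
    scale     -- scaling factor applied by the scaling unit (unit 42),
                 assumed here to divide the signal
    gain      -- gain of the gain adjusting unit (unit 44)
    """
    scaled = [s / scale for s in source]
    side_path = eq_filter(scaled)
    # Adder unit 41: direct signal plus gain-adjusted side path
    return [s + gain * e for s, e in zip(source, side_path)]


def pass_through(samples):
    """Placeholder EQ filter that leaves the signal unchanged."""
    return list(samples)


fig9_out = dec_output([1.0, 2.0], pass_through, scale=2.0, gain=0.5)
fig9_bypass = dec_output([1.0, 2.0], pass_through, scale=2.0, gain=0.0)
```

With a gain of zero the side path contributes nothing and the source passes through unchanged, which matches the idea that equalization is only added on top of the reference amplitude when the DVC gain calls for it.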

FIG. 10 is an example of a relatively simple DVC/DEC system. In addition, the new gain value can be calculated together with the previously determined scaling factor to produce a further version of the DVC/DEC system. The filter of FIG. 9 is applied to the system of FIG. 10 such that the filter is not connected directly to the source but through a gain adjusting unit 46 controlled by a DVC unit 47. The DVC unit 47 is supplied with signals from the loudspeaker 45 and a microphone 48 and further controls an LPC unit 49 and a gain calculation unit 50. The DVC unit 47 and the gain calculation unit 50 are further controlled by a volume control 51. The EQ filter 43 is controlled by the LPC unit 49 which also controls, via a Dirac pulsed filter 52 and a scaling calculation unit 53, the scaling unit 42, where the gain adjusting unit 44 is controlled by the gain calculation unit 50.

FIG. 11 illustrates another DVC/DEC system. The system of FIG. 11 differs from the system of FIG. 10 in that in the system of FIG. 11 the scaling unit 42 is not connected between the gain adjusting unit 46 and the EQ filter 43 (as in FIG. 10), but between the output of the gain calculation unit 50 and the control input of the gain adjusting unit 44.

FIG. 12 illustrates an example of a relatively complex DVC/DEC system. The system of FIG. 12 has a microphone path and a reference path. The microphone path is established by a microphone 614, an AD converter 620, a low-pass filter 621, a sample rate conversion unit 622, and an A-weighting high-pass filter 623. The reference path is established by an AD converter 628, a low-pass filter 629, a sample rate conversion unit 630, an A-weighting high-pass filter 624, and a delay unit 631. The AD converter 628 receives a reference signal from an adder unit 615 connected to two delay units 616, 617. The delay units 616, 617 are each supplied with a signal provided to two loudspeakers 612, 613 via corresponding amplifying units 618, 619, respectively.

The reference path further comprises an adaptive filter having a finite impulse response (FIR) filter core 641 controllable by a Least Mean Square (LMS) adaptation unit 642. The LMS unit 642 receives an input signal from the delay unit 631, an error signal e[n] from a subtraction unit 644, and an adaptation step size μ[n] from a step size calculation unit 645. The step size calculation unit 645 is supplied with: (i) a mean source signal MeanX[n] tapped from the sample rate conversion unit 630 via an average unit 650, (ii) an estimated system distance Dist[n] supplied by the FIR filter core 641, and (iii) a mean error signal MeanE[n] tapped from the subtraction unit 644 via an average unit 651. The subtraction unit 644 further receives via the microphone path a signal from the microphone 614 which is part of a loudspeaker-room-microphone (LRM) system (not shown in FIG. 12). The LRM system further comprises a room having a transfer function H_(Room)(z), and the two loudspeakers 612, 613.
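The interplay of the FIR filter core, the subtraction unit, and the LMS adaptation unit can be sketched as follows. This is a minimal textbook LMS loop under stated assumptions, not the implementation of FIG. 12: the function name, the two-tap "room" response, the noise-free microphone signal, and the constant step size are all hypothetical choices for illustration.

```python
import random


def lms_identify(reference, microphone, num_taps, step_sizes):
    """Adapt an FIR filter so that its output tracks the microphone signal.

    reference  -- reference signal samples x[n]
    microphone -- microphone signal samples d[n]
    step_sizes -- per-sample adaptation step size mu[n]
    Returns the final filter weights and the error signal e[n].
    """
    weights = [0.0] * num_taps
    delay_line = [0.0] * num_taps
    errors = []
    for x, d, mu in zip(reference, microphone, step_sizes):
        delay_line = [x] + delay_line[:-1]
        y = sum(w * v for w, v in zip(weights, delay_line))  # FIR filter core
        e = d - y                                            # subtraction unit
        # LMS update: move the weights along the error gradient
        weights = [w + mu * e * v for w, v in zip(weights, delay_line)]
        errors.append(e)
    return weights, errors


# Hypothetical test setup: a two-tap "room" and a seeded random reference
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
room = [0.5, -0.3]
microphone = [
    room[0] * reference[n] + (room[1] * reference[n - 1] if n > 0 else 0.0)
    for n in range(len(reference))
]
weights, errors = lms_identify(reference, microphone, 2, [0.1] * len(reference))
```

In this noise-free setup the weights converge to the assumed room response and the error signal decays toward zero; in the described system the remaining error would instead carry the ambient noise that the DVC/DEC stages evaluate.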

The error signal e[n] is supplied via a scaling unit 659 to a prediction error filter core 660 serving as a whitening filter. Downstream of the filter core 660 is an averaging unit 661, a voice activity detector 662, a gain correction unit 663, a maximum gain unit 664, and a compare unit 665 evaluating whether the gain is below 1. If the gain is below 1, the gain of an amplifying unit 666 is set to 1; otherwise the gain is left unchanged. The amplifying unit 666 is supplied with an electrical sound signal from a sound source (e.g., CD player, cassette player, radio and the like) and feeds the amplified sound signal into a prediction filter core 667 controlled by a linear predictive coding unit 658 supplied with the error signal e[n]. Downstream of the filter core 667 is a scaling unit 668 and an adder unit 669 which also receives the signal from the amplifying unit 666 input into the filter core 667. The output signal of the adder unit 669 is processed by a sound signal processing unit 670 in a common way and supplied to loudspeakers 612 and 613.

The gain settings from the compare unit 665 as well as mean volume settings 671 by the listener are used to set a gain control unit 672 for controlling the gain correction unit 663. The mean volume settings 671 are further used to control a gain control unit 673 for controlling a scaling unit 674. The scaling unit 674, which controls the scaling unit 668, further receives signals from a Dirac pulsed prediction filter 675 and the VAD 662. The Dirac pulsed prediction filter 675 controls the prediction error filter 660 by copying its filter coefficients into the prediction error filter 660.

Fundamentally, the reference signal is tapped as close as possible to the end of the signal processing chain to prevent negative effects of the equalizing and dynamic modifications (limiters, compressors, et cetera) on the performance of the adaptive filter. Moreover, it has been assumed that it would be advantageous to obtain the reference signal by adding all relevant signal paths. However, in practice, not only is the phase of the reference signal strongly distorted, but also its amplitude, due to the phase differences between the individual signal paths, some of which are quite significant. This situation necessarily leads to an incorrectly working adaptive filter.

One way of improving the performance is to use a single, broadband output signal for which no addition is performed. This is also permissible provided all output signals are strongly correlated with each other, which is usually the case with audio signals. Unfortunately, broadband output signals of this nature are rarely found in signal flow diagrams since the output signals are generally adapted to the bandwidths of the connected loudspeakers using crossover filters before they are output, and are consequently limited in bandwidth. A more desirable solution is to use the reference signal generation configuration as illustrated in FIG. 13. This configuration makes use of all correlations when generating the reference signal and dispenses with all signal addition operations.

Yet another technique likewise embodies an alternative method for reference signal generation. In this alternative technique, the reference signal is generated from one or more input signals. Although this technique is in principle undesirable, it has shown itself in practice to be a simple, yet efficient mechanism of reference signal generation. This is true when the entire signal processing between the input and output signals is considered as a stationary system. In this respect, the system is practically part of the LRM system, which can be reproduced by the adaptive filter. In addition to the music, which is normally available in a stereo signal, any available special channel typically serves as a reference, if it is not to act as a potential noise influence. In doing so, different source signals may be mixed together in a particular relationship at the same time. The mixer determines the extent to which one signal mixes with another.

Consequently, the output signal of the mixer, which mixes all source signals together, represents a relatively good input signal for this purpose. The signal amplitude is still a problem though in that it is likewise modified by the mixer. Since the greater the amplitude of the reference signal, the better the adaptive filter works, one technique of obtaining a continuously well-controlled reference signal (mixer output signal) is to scale the reference signal such that it is as fully driven as possible, regardless of the configured volume. To find the scaling factor, the maximum value of the sum of all loudness values (volumes) of the source signals is determined, and its difference to the fully driven signal is calculated, which is the equivalent of the scaling factor. Depending on whether the volume values are linear or logarithmic, the following formulae are used to calculate the scaling factor:

Logarithmic: Scale_(dB) = 0_(dB) − MaxVol_(dB)
Linear: Scale_(Lin) = 1/MaxVol_(Lin)

The calculated maximum volume is then also used as the reference loudness value (VOLUME) in the calculation of the anti-gain chase function in the DVC/DEC system so that problems due to relatively low volumes are avoided from the outset.
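The two scaling formulae can be sketched directly. The sketch assumes that MaxVol is simply the largest of the configured source volumes; the function names and example volumes are hypothetical.

```python
def reference_scale_db(volumes_db):
    """Scale_dB = 0 dB - MaxVol_dB, for volumes given in dB relative to full scale."""
    return 0.0 - max(volumes_db)


def reference_scale_lin(volumes_lin):
    """Scale_Lin = 1 / MaxVol_Lin, for volumes given as linear factors."""
    max_vol = max(volumes_lin)
    if max_vol <= 0.0:
        raise ValueError("maximum volume must be positive")
    return 1.0 / max_vol


# Hypothetical source volumes
scale_db = reference_scale_db([-20.0, -6.0, -12.0])
scale_lin = reference_scale_lin([0.1, 0.5, 0.25])
```

In both cases the loudest source ends up at full scale after scaling, so the reference signal stays well driven regardless of the configured volume.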

FIG. 13 illustrates a model for generating the reference signal and reference volume. In the model illustrated in FIG. 13, a mono signal is generated from the stereo music input signals “MusicLeft” and “MusicRight” before being fed into a mixing matrix 60. This is not absolutely necessary, but it simplifies the mixing matrix 60. The problem associated with impure phase addition normally does not occur, or only to a minor degree, in this system as no signal processing has yet been performed using the two music signals, thereby rendering the risk of reference signal distortion minimal.

Practical experimentation using the DVC system demonstrated that it functions in an acceptable manner within a specific noise level range, but that it begins to respond (at least as subjectively perceived) in an exaggerated manner to noise levels above that range. For this reason, a gain correction function (scaling unit 62 in connection with maximum volume search 61 and difference-to-full-scale calculation unit 63) may be integrated to modify the noise level value (“NoiseLevel” in FIG. 15) weighted with the correction factor (“Corr”) deriving from the anti-gain chase function. The weighted level is referred to as the ratio.

FIG. 14 illustrates the characteristic of the gain correction function. FIG. 15 illustrates the implementation of a gain correction function. The gain correction function is controlled by two parameters, namely the noise threshold (“NoiseTH”) and NoiseSlope, which is the slope of the right-hand line of FIG. 14. NoiseTH is used to control how a measured increase in the background noise level results in a volume increase. Below NoiseTH the ratio is about 1:1, and above NoiseTH the ratio is below 1:1, as indicated by NoiseSlope. A reason for this may be that, as an outcome of under-sampling, the lower frequency components, which are essentially the only ones used for the calculation of the gain factors, increase more than higher frequency components do with an increase of the background noise. However, as the DVC gain is effective in the entire frequency range and listeners are most sensitive in the mid-range, the system has to be adapted by introducing a gain correction function to address the issue outlined above. The gain correction function may be incorrect for lower frequencies, but this incorrectness may be overcome by arranging the DVC gain units downstream from a loudness processing unit. Alternatively, the DEC may be adapted to compensate for the incorrectness.
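The two-segment characteristic described above can be sketched as a piecewise linear function. The sketch assumes that levels are expressed in dB, that the two segments meet continuously at NoiseTH, and that the lower segment is exactly 1:1; the function name and example values are hypothetical.

```python
def gain_correction(noise_level, noise_th, noise_slope):
    """Map a measured noise level to a corrected level.

    Below noise_th the level passes through unchanged (1:1 ratio);
    above it, further increases are weighted by noise_slope (< 1).
    """
    if noise_level <= noise_th:
        return noise_level
    return noise_th + noise_slope * (noise_level - noise_th)


# Hypothetical parameters: threshold 20 dB, slope 0.5 above the threshold
corrected_low = gain_correction(10.0, noise_th=20.0, noise_slope=0.5)
corrected_high = gain_correction(30.0, noise_th=20.0, noise_slope=0.5)
```

A slope below 1 means that, above the threshold, each additional decibel of measured noise produces less than one decibel of additional gain, which counters the exaggerated response observed at high noise levels.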

Referring to the speech activity detection function illustrated in FIG. 7, it sometimes takes a relatively long time until the actual final noise level value is reached so that appropriate countermeasures can be taken. The waiting time is due to the low slew rate (“C_INC”) with which the system responds to increasing noise levels. The threshold value, “MinNoiseLevel”, is used to prevent the noise level from falling below a specific level, which in effect is identical to the activation threshold of the system. However, care must be taken to avoid setting the threshold too high as otherwise the control function may be activated although the noise level is not high enough.

By deploying the anti-gain chase function, the maximum permissible control dynamics (“MaxGain”) vary according to the volume. Since the activation threshold (“MinNoiseLevel”) depends on the maximum gain, it must likewise be modified according to the volume. This yields a characteristic for the activation threshold, a straight line whose Y-axis intercept (“t_MNTH”) varies with the maximum gain (“MaxGain”) and the noise scaling factor (“Scale”), and whose slope (“m_MNTH”) is correlated with the slope of the anti-gain chase function (“m_Corr”). FIG. 16 illustrates the characteristics of the activation threshold.

FIG. 17 illustrates an implementation of the MinNoiseTH (activation threshold) function. The use of an adaptation step size μ[n] has shown itself to be relatively useful in practice. In particular, the negative effects of strong, brief noise signals, for example, the human voice, can be reduced to a minimum using the appropriate step size, especially in the case of transmission of loud, bass source signals since these contribute little or nothing in the calculation of the noise level. The energy of the error signal, MeanE[n] (FIG. 1), is scaled according to the amplitude of the reference signal when tuning the adaptation step size (the reference signal is driven as much as possible, as described earlier). The greater the effect of the reference signal in the calculation of the adaptation step size, the more the energy of the error signal must be scaled upwards. The converse applies equally.

If the exemplary implemented model illustrated in FIG. 13 for generating the reference signal is used, the formula of equation 1 is also correct. However, this is not always the case. Care should be taken that no error is made since this may cause problems with the entire adaptive filter, which may have worse consequences for the entire DVC/DEC system. It was also discovered that the adaptation step size, μ[n], may be limited to a specific range (μ[n] ∈ [0, μ_(Max)[n]]) so that the adaptive filter can function more reliably. In particular, the maximum permissible adaptation step size μ_(Max)[n] should not be set to a relatively large value as otherwise there is a risk that the adaptive filter will become unstable and start to oscillate, which may cause the DVC/DEC system to fail. A value of about 0.01 for μ_(Max)[n] has proven acceptable. This value is not intended to apply in general but may rather serve as a guideline figure. Its precise maximum value must be determined for each particular case.
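The limiting of the adaptation step size can be sketched as follows. The step size expression used here (mean reference signal times estimated system distance, divided by mean error signal) follows the relationship recited in claim 3; the function name, the handling of a zero mean error, and the example values are assumptions for illustration.

```python
def adaptation_step_size(mean_x, dist, mean_e, mu_max=0.01):
    """Compute mu[n] = MeanX[n] * Dist[n] / MeanE[n], limited to [0, mu_max].

    mu_max defaults to the guideline value of about 0.01 given in the text.
    If the mean error is zero, mu_max is returned (an assumption; the text
    does not specify this case).
    """
    if mean_e <= 0.0:
        return mu_max
    mu = mean_x * dist / mean_e
    return min(max(mu, 0.0), mu_max)


# Hypothetical values: the first stays within range, the second is limited
mu_small = adaptation_step_size(mean_x=1.0, dist=0.001, mean_e=0.5)
mu_limited = adaptation_step_size(mean_x=1.0, dist=0.5, mean_e=0.1)
```

A large mean error relative to the reference level, as occurs when speech or other brief noise is present, yields a small step size, so the filter adapts cautiously exactly when the error signal is least trustworthy.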

Although various exemplary embodiments of the invention have been disclosed, it will be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the spirit and scope of the invention. It will be obvious to those reasonably skilled in the art that other components performing the same functions may be suitably substituted. Further, the methods of the invention may be achieved in either all software implementations, using the appropriate processor instructions, or in hybrid implementations that utilize a combination of hardware logic and software logic to achieve the same results. Such modifications to the inventive concept are intended to be covered by the appended claims.

Although an example of this invention has been described hereinabove in detail, it is desired to emphasize that this has been for the purpose of illustrating the invention and should not be considered as necessarily limitative of the invention, it being understood that many modifications and variations can be made by those skilled in the art while still practicing the invention claimed herein.

1. An audio enhancement system for compensating for ambient noise in a listening environment, comprising: audio means for receiving an electrical sound signal and generating a sound output from the electrical sound signal; sensor means for obtaining a total sound signal representative of the total sound level in the listening environment, where the total sound level comprises both the sound output from the audio means and the ambient noise within the listening environment; extraction means responsive to the total sound signal and to a reference signal derived from the electrical sound signal, for extracting an ambient noise signal representative of the ambient noise in the environment from the total sound signal, where the extraction means comprises an adaptive filter with an adaptive step size; control means responsive to the ambient noise signal for generating a control signal; equalizer means responsive to the control signal for adjusting the sound output of the audio means to compensate for the ambient noise level; and a step size calculation unit that controls the adaptive step size of the adaptive filter.
2. The audio enhancement system of claim 1, where the step size calculation unit controls the adaptive step size in response to a mean reference signal, an estimated system distance, and a mean error signal.
3. The audio enhancement system of claim 2, where the adaptive step size is equal to the product of the mean reference signal and the estimated system distance, divided by the mean error signal.
4. The audio enhancement system of claim 1, where the adaptive filter comprises a filter core and a least mean square adaptation unit for controlling the filter core, and where the LMS adaptation unit is controlled by the step size calculation unit.
5. The audio enhancement system of claim 1, further comprising a delay for delaying either the total sound signal or the electrical sound signal supplied to the extraction means.
6. The audio enhancement system of claim 1, where the extraction means comprises a total sound signal path having a low-pass filter and a sample rate conversion unit for processing the total sound signal.
7. The audio enhancement system of claim 1, where the extraction means comprises a reference signal path having a low-pass filter and a sample rate conversion unit for processing the reference signal.
8. The audio enhancement system of claim 1, where the audio means comprises means for producing at least two electrical sound signals and for generating at least two sound outputs from the electrical sound signals, where the at least two electrical sound signals are processed in a reference signal means for calculating to generate a single reference signal.
9. The audio enhancement system of claim 8, where the means for processing the electrical sound signals includes attenuating and/or delaying of each electrical sound signal and summing of the attenuated and/or delayed electrical sound signals.
10. The audio enhancement system of claim 1, where the audio means produces at least two electrical sound signals and generates at least two sound outputs from the electrical sound signals, where the at least two electrical sound signals each form a reference signal supplied to the extraction means.
11. The audio enhancement system of claim 10, where the extraction means comprises for each reference signal a reference signal path having a low-pass filter and a sample rate conversion unit for processing the reference signal.
12. The audio enhancement system of claim 11, where the extraction means comprises for each reference signal path an adaptive filter, where output signals of each adaptive filter are supplied to the subsequent adaptive filter as a total sound signal.
13. The audio enhancement system of claim 1, where the audio means produces at least three electrical sound signals and generates at least three sound outputs from the electrical sound signals, where one of the at least three electrical sound signals forms a first reference signal supplied to the extraction means and the others of the at least three electrical sound signals are processed to form a second reference signal.
 14. The audio enhancement system of claim 13, where the otherelectrical sound signals are processed in a reference signal calculationarrangement to generate a single reference signal.
 15. The audioenhancement system of claim 14, where the means for processing theelectrical sound signals includes means for attenuating and/or delayingof each electrical sound signal and summing of the attenuated and/ordelayed electrical sound signals.
 16. The audio enhancement system ofclaim 1, further comprising a predictive error filter connecteddownstream from the extraction means.
 17. The audio enhancement systemof claim 16, further comprising a linear predictive coding unit thatcontrols the predictive error filter.
 18. The audio enhancement system of claim 1, where each reference signal is delayed by the delay means.
 19. The audio enhancement system of claim 1, further comprising a voice activity detector.
 20. The audio enhancement system of claim 19, where the voice activity detector examines whether the current noise value is greater than the previous noise level and if greater, a new noise level is calculated from the previous noise level such that the noise level increases by a predetermined increment, regardless of how much the current noise signal value is greater than the previous noise level.
 21. The audio enhancement system of claim 1, further comprising a plurality of audio transducers equal or greater than the number of sound outputs generated from the respective electrical sound signals, where each of the audio transducers provides reference signals.
 22. The audio enhancement system of claim 21, where the sound outputs are generated by loudspeakers and each of the audio transducers is arranged adjacent to the respective loudspeaker.
 23. The audio enhancement system of claim 1, further comprising an amplifier connected downstream of the equalizer means for amplifying the output signal of the equalizer means.
 24. The audio enhancement system of claim 23, where the amplifier is controlled by a gain control means.
 25. The audio enhancement system of claim 1, further comprising a scaler circuit for scaling the electrical sound signal provided to the equalizer means.
 26. The audio enhancement system of claim 1, further comprising a scaler connected between the gain control means and the amplifier.
 27. The audio enhancement system of claim 25, where the scaler is controlled by a scaling control means for controlling the scaling amount.
 28. The audio enhancement system of claim 27, where the scaler comprises a Dirac filter and a scaling calculation unit.
 29. The audio enhancement system of claim 23, where the gain control means comprises a noise threshold value to prevent the noise level from falling below a specific level.
 30. The audio enhancement system of claim 29, where the noise threshold value is dependent on a volume setting.
 31. A method for enhancing the sound signal produced by an audio system in a listening environment by compensating for ambient noise in a listening environment, the method comprising the steps of: receiving an electrical sound signal and generating a sound output from the electrical sound signal; obtaining a total sound signal representative of the total sound level in the environment, where the total sound level comprises both the sound output from the audio system and the ambient noise within the environment; extracting an ambient noise signal representative of the ambient noise in the environment from the total sound signal in response to the total sound signal and to a reference signal derived from the electrical sound signal, where the step of extracting includes the step of adaptive filtering with an adaptive step size; generating a control signal in response to the ambient noise signal; adjusting the sound output of the audio system to compensate for the ambient noise level in response to the control signal; and controlling the adaptive step size by calculating a step size.
 32. The method of claim 31, where the step of calculating the step size processes a mean reference signal, an estimated system distance, and a mean error signal.
 33. The method of claim 32, where the step size is equal to the product of the mean reference signal and the estimated system distance, divided by the mean error signal.
 34. The method of claim 31, where the step of adaptive filtering comprises a least mean square (LMS) adaptation step, where the LMS adaptation step is controlled by the step size calculation step.
 35. The method of claim 31, further comprising the step of delaying either the total sound signal or the electrical sound signal provided to the extracting step.
 36. The method of claim 31, where the extracting step comprises the steps of low-pass filtering and sample rate converting for processing the total sound signal.
 37. The method of claim 31, where the extracting step comprises the steps of low-pass filtering and sample rate converting for processing the reference signal.
 38. The method of claim 31, where at least two electrical sound signals are received and at least two sound outputs from the electrical sound signals are generated, where the at least two electrical sound signals are processed to generate a single reference signal.
 39. The method of claim 38, where the processing of the electrical sound signals includes the steps of attenuating and/or delaying of each electrical sound signal and summing of the attenuated and/or delayed electrical sound signals.
 40. The method of claim 31, further comprising the steps of receiving at least two electrical sound signals and generating at least two sound outputs from the electrical sound signals, where the at least two electrical sound signals each form a reference signal that is processed in the step of extracting.
 41. The method of claim 40, where the extracting step comprises for each reference signal the steps of low-pass filtering and sample rate converting for processing the reference signals.
 42. The method of claim 41, where the extracting step comprises for each reference signal the steps of adaptive filtering and supplying the output signals of each adaptive filter to the subsequent adaptive filter as a total sound signal.
 43. The method of claim 31, where at least three electrical sound signals are received and at least three sound outputs are generated from the electrical sound signals, where one of the at least three electrical sound signals forms one reference and the others of the at least three electrical sound signals form other reference signals.
 44. The method of claim 43, where the others of the at least three electrical sound signals are processed in a reference signal calculation arrangement to generate a single reference signal.
 45. The method of claim 44, where the processing of the electrical sound signals includes the steps of attenuating and/or delaying of each electrical sound signal and summing of the attenuated and/or delayed electrical sound signals.
 46. The method of claim 31, further comprising the step of predictive error filtering.
 47. The method of claim 46, further comprising the step of linear predictive coding for controlling the predictive error filtering.
 48. The method of claim 31, further comprising the step of delaying each reference signal.
 49. The method of claim 31, further comprising the step of voice activity detecting.
 50. The method of claim 49, further comprising the step of determining whether the current noise value is greater than the previous noise level and if greater, a new noise level is calculated from the previous noise level such that the noise level increases by a predetermined increment, regardless of how much the current noise signal value is greater than the previous noise level.
 51. The method of claim 31, further comprising the step of generating a plurality of total sound signals, where the number of sound signals is equal to the number of sound outputs generated from the respective electrical sound signals, and where the total sound signals each form reference signals.
 52. The method of claim 51, where the total sound signals each form reference signals.
 53. The method of claim 31, where the step of adjusting comprises amplifying.
 54. The method of claim 53, where the amplifying step is controlled by a gain control step.
 55. The method of claim 31, further comprising scaling the electrical sound signal provided to the equalizer step.
 56. The method of claim 54, further comprising a scaling step between the gain control step and the amplifying step.
 57. The method of claim 55, where the scaling step is controlled by a scaling control step.
 58. The method of claim 57, where the scaling control step comprises the step of Dirac filtering and scaling.
 59. The method of claim 53, where the gain control step comprises a noise threshold value to prevent the noise level from falling below a specific level.
 60. The method of claim 59, where the noise threshold value is dependent on a volume setting.
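
The relationships recited in claims 33, 34 and 50 can be illustrated with a short sketch. This is not the patented implementation; it only restates the claimed arithmetic in Python under stated assumptions. The function names, the `eps` guard against division by zero, the plain LMS coefficient update, and the behaviour of the noise tracker when the current value is not greater than the previous level are all hypothetical choices made for illustration. The estimated system distance is treated as a given input.

```python
def step_size(mean_ref, system_distance, mean_err, eps=1e-12):
    """Claim 33: (mean reference * estimated system distance) / mean error.

    eps is an assumed guard against division by zero, not part of the claim.
    """
    return mean_ref * system_distance / max(mean_err, eps)


def lms_update(weights, ref_window, error, mu):
    """One textbook LMS coefficient update, w_i += mu * error * x_i.

    Claim 34 only states that the LMS adaptation is controlled by the
    calculated step size; the exact update form here is an assumption.
    """
    return [w + mu * error * x for w, x in zip(weights, ref_window)]


def track_noise_level(previous_level, current_value, increment):
    """Claims 20 and 50: a rising noise level grows by a fixed increment.

    If the current value exceeds the previous level, the new level is the
    previous level plus the predetermined increment, however large the jump.
    Following the current value downward otherwise is an assumption.
    """
    if current_value > previous_level:
        return previous_level + increment
    return current_value


if __name__ == "__main__":
    mu = step_size(mean_ref=0.5, system_distance=0.2, mean_err=0.1)
    print("step size:", mu)
    print("weights:", lms_update([0.0, 0.0], [1.0, 2.0], error=0.5, mu=0.1))
    print("noise level:", track_noise_level(1.0, 5.0, increment=0.25))
```

Limiting the upward change to a fixed increment means that a short, loud event such as speech in the cabin raises the estimated noise level only slowly, which is consistent with the role the claims give to the voice activity detector.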